NSF PAR Search | NSF Public Access Repository

NNKcat: deep neural network to predict catalytic constants (Kcat) by integrating protein sequence and substrate structure with enhanced data imbalance handling

https://doi.org/10.1093/bib/bbaf212

Zhai, Jingchen; Qi, Xiguang; Cai, Lianjin; Liu, Yue; Tang, Haocheng; Xie, Lei; Wang, Junmei (May 2025, Briefings in Bioinformatics)

Abstract: The catalytic constant (Kcat) describes the efficiency of an enzyme-catalyzed reaction. The Kcat value of an enzyme-substrate pair indicates the rate at which an enzyme converts substrate into product under saturating substrate conditions. However, it is challenging to construct robust prediction models for this important property. Most existing models, including the one recently published in Nature Catalysis (Li et al.), suffer from overfitting. In this study, we propose a novel protocol to construct Kcat prediction models, introducing an intermediate step to separately develop substrate and protein processors. The substrate processor analyzes Simplified Molecular Input Line Entry System (SMILES) strings using a graph neural network model, Attentive FP, while the protein processor abstracts protein sequence information using a long short-term memory (LSTM) architecture. This protocol not only mitigates the impact of data imbalance in the original dataset but also provides greater flexibility in customizing the general-purpose Kcat prediction model to enhance the prediction accuracy for specific enzyme classes. Our general-purpose Kcat prediction model demonstrates significantly enhanced stability and slightly better accuracy (R2 value of 0.54 versus 0.50) in comparison with Li et al.’s model on the same dataset. Additionally, our modeling protocol enables fine-tuning of the general-purpose Kcat model for specific enzyme categories through focused learning. Using Cytochrome P450 (CYP450) enzymes as a case study, we achieved a best R2 value of 0.64 for the focused model. The strong performance and extensibility of the model support its broad application in enzyme engineering and drug research and development.
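For readers unfamiliar with the R2 metric quoted in this abstract, the following is a minimal sketch of the coefficient of determination in plain Python. The abstract does not say which R2 definition the authors used (some studies report the squared Pearson correlation instead), so the formula below, the function name `r_squared`, and the toy inputs are assumptions for illustration only.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumed definition; the abstract does not specify how R2 was computed.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after using the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy values, not data from the paper.
    measured = [1.0, 2.0, 3.0]
    predicted = [1.1, 1.9, 3.2]
    print(r_squared(measured, predicted))
```

Under this definition, a model that always predicts the mean of the measured values scores 0 and a perfect model scores 1, which gives a rough scale for reading the reported values of 0.50, 0.54, and 0.64.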
AI-driven multi-omics integration for multi-scale predictive modeling of genotype-environment-phenotype relationships

https://doi.org/10.1016/j.csbj.2024.12.030

Wu, You; Xie, Lei (January 2025, Computational and Structural Biotechnology Journal)

Full Text Available
Semi-supervised meta-learning elucidates understudied molecular interactions

https://doi.org/10.1038/s42003-024-06797-z

Wu, You; Xie, Li; Liu, Yang; Xie, Lei (September 2024, Communications Biology)
Audiovisual Multimodal Cough Data Analysis for Tuberculosis Detection

https://doi.org/10.1109/IISA62523.2024.10786619

Yadav, Jyoti; Varde, Aparna S; Liu, Hao; Antoniou, George; Xie, Lei (July 2024, IEEE)

Full Text Available
Comprehensive cough data analysis on CODA TB

https://doi.org/10.1109/BigData59044.2023.10386805

Yadav, Jyoti; Varde, Aparna S; Xie, Lei (December 2023, IEEE)

Full Text Available
Rhythmic RFID Authentication

https://doi.org/10.1109/TNET.2022.3204204

Li, Jiawei; Wang, Chuyu; Li, Ang; Han, Dianqi; Zhang, Yan; Zuo, Jinhang; Zhang, Rui; Xie, Lei; Zhang, Yanchao (September 2022, IEEE/ACM Transactions on Networking)

Passive RFID technology is widely used in user authentication and access control. We propose RF-Rhythm, a secure and usable two-factor RFID authentication system with strong resilience to lost/stolen/cloned RFID cards. In RF-Rhythm, each legitimate user performs a sequence of taps on his/her RFID card according to a self-chosen secret melody. Such rhythmic taps can induce phase changes in the backscattered signals, which the RFID reader can detect to recover the user’s tapping rhythm. In addition to verifying the RFID card’s identification information as usual, the backend server compares the extracted tapping rhythm with what it acquires in the user enrollment phase. The user passes authentication checks if and only if both verifications succeed. We also propose a novel phase-hopping protocol in which the RFID reader emits Continuous Wave (CW) with random phases for extracting the user’s secret tapping rhythm. Our protocol can prevent a capable adversary from extracting and then replaying a legitimate tapping rhythm from sniffed RFID signals. Comprehensive user experiments confirm the high security and usability of RF-Rhythm with false-positive and false-negative rates close to zero.
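The two-factor decision described in this abstract (authentication passes if and only if both the card identity and the tapping rhythm verify) can be sketched as follows. This is an illustrative sketch only: the abstract does not describe how rhythms are represented or compared, so the inter-tap-interval representation, the tempo normalization, the `tolerance` threshold, and all names (`Enrollment`, `normalize_intervals`, `authenticate`) are assumptions rather than the authors' method. Signal processing and the phase-hopping protocol are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Enrollment:
    """Hypothetical server-side record created at user enrollment."""
    card_id: str
    rhythm: List[float]  # normalized inter-tap intervals


def normalize_intervals(tap_times: Sequence[float]) -> List[float]:
    """Convert tap timestamps to inter-tap intervals that sum to 1.

    Normalizing by total duration is an assumption; it makes the
    comparison insensitive to overall tapping tempo.
    """
    intervals = [b - a for a, b in zip(tap_times, tap_times[1:])]
    total = sum(intervals)
    if not intervals or total <= 0 or any(i <= 0 for i in intervals):
        raise ValueError("need at least two strictly increasing tap times")
    return [i / total for i in intervals]


def rhythm_matches(observed: Sequence[float], enrolled: Sequence[float],
                   tolerance: float) -> bool:
    """Second factor: every interval must be within the tolerance."""
    if len(observed) != len(enrolled):
        return False
    return all(abs(o - e) <= tolerance for o, e in zip(observed, enrolled))


def authenticate(card_id: str, tap_times: Sequence[float],
                 db: Dict[str, Enrollment], tolerance: float = 0.05) -> bool:
    """Pass if and only if both the card ID and the rhythm verify."""
    record = db.get(card_id)
    if record is None:  # first factor fails: unknown card
        return False
    try:
        observed = normalize_intervals(tap_times)
    except ValueError:
        return False
    return rhythm_matches(observed, record.rhythm, tolerance)


if __name__ == "__main__":
    db = {"card-1": Enrollment("card-1",
                               normalize_intervals([0.0, 0.5, 1.0, 2.0]))}
    print(authenticate("card-1", [0.0, 0.6, 1.2, 2.4], db))  # same rhythm, slower tempo
    print(authenticate("card-1", [0.0, 1.0, 1.5, 2.0], db))  # valid card, wrong rhythm
```

In this sketch, a lost, stolen, or cloned card presented with the wrong rhythm fails the second check even though the card ID is valid, which is the resilience property the abstract claims.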
Full Text Available
End-to-end sequence-structure-function meta-learning predicts genome-wide chemical-protein interactions for dark proteins

https://doi.org/10.1371/journal.pcbi.1010851

Cai, Tian; Xie, Li; Zhang, Shuo; Chen, Muge; He, Di; Badkul, Amitesh; Liu, Yang; Namballa, Hari Krishna; Dorogan, Michael; Harding, Wayne W.; et al (January 2023, PLOS Computational Biology)
Skolnick, Jeffrey (Ed.)
Systematically discovering protein-ligand interactions across the entire human and pathogen genomes is critical in chemical genomics, protein function prediction, drug discovery, and many other areas. However, more than 90% of gene families remain “dark”—i.e., their small-molecule ligands are undiscovered due to experimental limitations or human/historical biases. Existing computational approaches typically fail when the dark protein differs from those with known ligands. To address this challenge, we have developed a deep learning framework, called PortalCG, which consists of four novel components: (i) a 3-dimensional ligand binding site enhanced sequence pre-training strategy to encode the evolutionary links between ligand-binding sites across gene families; (ii) an end-to-end pretraining-fine-tuning strategy to reduce the impact of inaccuracy of predicted structures on function predictions by recognizing the sequence-structure-function paradigm; (iii) a new out-of-cluster meta-learning algorithm that extracts and accumulates information learned from predicting ligands of distinct gene families (meta-data) and applies the meta-data to a dark gene family; and (iv) a stress model selection step, using different gene families in the test data from those in the training and development data sets to facilitate model deployment in a real-world scenario. In extensive and rigorous benchmark experiments, PortalCG considerably outperformed state-of-the-art techniques of machine learning and protein-ligand docking when applied to dark gene families, and demonstrated its generalization power for target identifications and compound screenings under out-of-distribution (OOD) scenarios. Furthermore, in an external validation for the multi-target compound screening, the performance of PortalCG surpassed the rational design from medicinal chemists. 
Our results also suggest that a differentiable sequence-structure-function deep learning framework, where protein structural information serves as an intermediate layer, could be superior to conventional methodology where predicted protein structures were used for the compound screening. We applied PortalCG to two case studies to exemplify its potential in drug discovery: designing selective dual-antagonists of dopamine receptors for the treatment of opioid use disorder (OUD), and illuminating the understudied human genome for target diseases that do not yet have effective and safe therapeutics. Our results suggested that PortalCG is a viable solution to the OOD problem in exploring understudied regions of protein functional space.
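Component (iv) of the abstract, evaluating on gene families that do not appear in the training data, can be illustrated with a family-disjoint data split. The sketch below is a simplified assumption, not the PortalCG implementation: it produces only a two-way train/test split (the abstract describes training, development, and test sets), and the function name `split_by_family`, the `(family, item)` record layout, and the example family labels are hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Tuple

Sample = Tuple[Hashable, object]  # (gene family label, data item)


def split_by_family(samples: List[Sample], test_fraction: float = 0.2,
                    seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Hold out whole families so no family appears on both sides."""
    by_family = defaultdict(list)
    for family, item in samples:
        by_family[family].append(item)

    # Shuffle family labels (not individual samples) reproducibly.
    families = sorted(by_family, key=str)
    random.Random(seed).shuffle(families)

    # Hold out at least one family for testing.
    n_test = max(1, round(len(families) * test_fraction))
    test_families = set(families[:n_test])

    train = [(f, x) for f, x in samples if f not in test_families]
    test = [(f, x) for f, x in samples if f in test_families]
    return train, test


if __name__ == "__main__":
    # Hypothetical labels and items, for illustration only.
    data = [("kinase", "a"), ("kinase", "b"), ("GPCR", "c"),
            ("protease", "d"), ("ion channel", "e"), ("transporter", "f")]
    train, test = split_by_family(data)
    print({f for f, _ in train}, {f for f, _ in test})
```

Splitting by family rather than by individual sample means test performance reflects generalization to unseen families, which is closer to the out-of-distribution setting the abstract targets than a random split would be.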
Full Text Available
RF-Rhythm: Secure and Usable Two-Factor RFID Authentication

https://doi.org/10.1109/INFOCOM41043.2020.9155427

Li, Jiawei; Wang, Chuyu; Li, Ang; Han, Dianqi; Zhang, Yan; Zuo, Jinhang; Zhang, Rui; Xie, Lei; Zhang, Yanchao (July 2020, IEEE International Conference on Computer Communications (INFOCOM))
Full Text Available
The International Conference on Intelligent Biology and Medicine (ICIBM) 2018: bioinformatics towards translational applications

https://doi.org/10.1186/s12859-018-2460-3

Liu, Xiaoming; Xie, Lei; Wu, Zhijin; Wang, Kai; Zhao, Zhongming; Ruan, Jianhua; Zhi, Degui (December 2018, BMC Bioinformatics)

Full Text Available
Cross-Dependency Inference in Multi-Layered Networks: A Collaborative Filtering Perspective

https://doi.org/10.1145/3056562

Chen, Chen; Tong, Hanghang; Xie, Lei; Ying, Lei; He, Qing (August 2017, ACM Transactions on Knowledge Discovery from Data)

Full Text Available
